A Lyapunov-Based Methodology for Constrained Optimization with Bandit Feedback
Abstract
In a wide variety of applications including online advertising, contractual hiring, and wireless scheduling, the controller is constrained by a stringent budget constraint on the available resources, which are consumed in a random amount by each action, and a stochastic feasibility constraint that may impose important operational limitations on decision-making. In this work, we consider a general model to address such problems, where each action returns a random reward, cost, and penalty from an unknown joint distribution, and the decision-maker aims to maximize the total reward under a budget constraint B on the total cost and a stochastic constraint on the time-average penalty. We propose a novel low-complexity algorithm based on Lyapunov optimization methodology, named LyOn, and prove that for K arms it achieves regret of order √(KB log(B)) and zero constraint-violation when B is sufficiently large. The low computational cost and sharp performance bounds of LyOn suggest that Lyapunov-based algorithm design methodology can be effective in solving constrained bandit optimization problems.
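The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the LyOn algorithm as specified in the paper; it is a generic drift-plus-penalty heuristic combined with UCB-style confidence bounds, written only to show how a virtual queue can track a time-average penalty constraint while a budget B is being spent. The function name `run_budgeted_bandit`, the trade-off weight `v`, the cost floor `0.05`, the `max_rounds` guard, and the two demo arms are all assumptions made for illustration.

```python
import math
import random


def run_budgeted_bandit(arms, budget, penalty_limit, v=10.0, seed=0,
                        max_rounds=100_000):
    """Illustrative drift-plus-penalty bandit (NOT the paper's LyOn).

    arms: list of callables rng -> (reward, cost, penalty), each value in
          [0, 1] with cost > 0 (assumption of this sketch).
    budget: total cost budget B.
    penalty_limit: target upper bound on the time-average penalty.
    v: hypothetical weight trading reward against queue drift.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [[0.0, 0.0, 0.0] for _ in range(k)]  # reward, cost, penalty
    queue = 0.0          # virtual queue tracking accumulated violation
    remaining = budget
    total_reward = 0.0
    total_penalty = 0.0
    rounds = 0

    while remaining > 0 and rounds < max_rounds:
        t = rounds + 1
        if t <= k:
            arm = t - 1  # play every arm once to initialise estimates
        else:
            def score(i):
                n = counts[i]
                bonus = math.sqrt(2.0 * math.log(t) / n)
                r_hi = min(1.0, sums[i][0] / n + bonus)   # optimistic reward
                c_lo = max(0.05, sums[i][1] / n - bonus)  # optimistic cost
                p_lo = max(0.0, sums[i][2] / n - bonus)   # optimistic penalty
                # reward minus queue-weighted drift, per unit of cost
                return (v * r_hi - queue * (p_lo - penalty_limit)) / c_lo
            arm = max(range(k), key=score)

        reward, cost, penalty = arms[arm](rng)
        if cost > remaining:
            break  # the budget cannot cover this action: stop
        remaining -= cost
        rounds += 1
        counts[arm] += 1
        for j, x in enumerate((reward, cost, penalty)):
            sums[arm][j] += x
        total_reward += reward
        total_penalty += penalty
        # virtual queue grows whenever the penalty exceeds the target
        queue = max(0.0, queue + penalty - penalty_limit)

    return {
        "rounds": rounds,
        "spent": budget - remaining,
        "total_reward": total_reward,
        "avg_penalty": total_penalty / rounds if rounds else 0.0,
        "counts": counts,
    }


# Hypothetical demo arms: (reward, cost, penalty) with fixed cost 0.5.
demo_arms = [
    lambda rng: (0.9 if rng.random() < 0.9 else 0.0, 0.5, 0.8),
    lambda rng: (0.5 if rng.random() < 0.9 else 0.0, 0.5, 0.1),
]
result = run_budgeted_bandit(demo_arms, budget=50.0, penalty_limit=0.3)
print(result)
```

The virtual queue grows whenever the observed penalty exceeds `penalty_limit`, which pushes later choices toward low-penalty arms; dividing the score by the estimated cost favours arms that return more per unit of budget. No regret or constraint-violation guarantee is claimed for this sketch.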
Similar resources
Stochastic convex optimization with bandit feedback
This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. The quantity of interest is the regret of the algorithm, which is the sum of the function values at algorithm’s query points...
Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous √d factor compared to previous works, and gives a regret bound...
A Robust Knapsack Based Constrained Portfolio Optimization
Many portfolio optimization problems deal with allocation of assets which carry a relatively high market price. Therefore, it is necessary to determine the integer value of assets when we deal with portfolio optimization. In addition, one of the main concerns with most portfolio optimization is associated with the type of constraints considered in different models. In many cases, the resulted p...
Online Stochastic Optimization under Correlated Bandit Feedback
In this paper we consider the problem of online stochastic optimization of a locally smooth function under bandit feedback. We introduce the high-confidence tree (HCT) algorithm, a novel any-time X-armed bandit algorithm, and derive regret bounds matching the performance of existing state-of-the-art in terms of dependency on number of steps and smoothness factor. The main advantage of HCT is t...
Stochastic Linear Optimization under Bandit Feedback
In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision maker chooses one of k arms and incurs a cost chosen from an unknown distribution associated with that arm. In the linear optimization analog of this problem, rather than finitely many arms, the decision set is a compact subset of R and the cost of each decision is just the evaluation of a randomly c...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i4.20285